Cluster-based patent retrieval

Authors

  • In-Su Kang
  • Seung-Hoon Na
  • Jungi Kim
  • Jong-Hyeok Lee
Abstract

Through the recent NTCIR workshops, patent retrieval has posed many challenging issues to the information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics make it necessary to reassess existing retrieval techniques, which have mainly been developed for short, unstructured documents such as newspaper articles. This study investigates cluster-based retrieval in the context of the invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters provide additional evidence for matching a user's information need. Thus far, cluster-based retrieval approaches have relied on automatically created clusters. Fortunately, all patents carry manually assigned cluster information in the form of International Patent Classification codes. The International Patent Classification is a standard taxonomy for classifying patents; it currently has about 69,000 nodes organized into a five-level hierarchy. Patent documents could therefore provide the best test bed for developing and evaluating cluster-based retrieval techniques. Experiments on the NTCIR-4 patent collection showed that the cluster-based language model could help improve on the cluster-less baseline language model. © 2006 Elsevier Ltd. All rights reserved.
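The idea of using classification codes as clusters can be illustrated with a minimal sketch. It assumes a simple linear mixture of document, cluster, and collection unigram models, which is one common way to build a cluster-based language model and is not necessarily the formulation used in the paper. The function names, the mixing weights, the toy patents, and the classification codes attached to them are all hypothetical.

```python
import math
from collections import Counter


def unigram_model(token_lists):
    """Maximum-likelihood unigram model over one or more token lists."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def log_query_likelihood(query, doc, cluster, collection,
                         w_doc=0.5, w_cluster=0.3):
    """Log P(query | doc) mixing document, cluster, and collection models.

    The weights are illustrative assumptions; the collection model
    receives the remaining mass, 1 - w_doc - w_cluster.
    """
    w_coll = 1.0 - w_doc - w_cluster
    p_doc = unigram_model([doc])
    p_cluster = unigram_model(cluster)
    p_coll = unigram_model(collection)
    score = 0.0
    for term in query:
        p = (w_doc * p_doc.get(term, 0.0)
             + w_cluster * p_cluster.get(term, 0.0)
             + w_coll * p_coll.get(term, 0.0))
        if p == 0.0:
            return float("-inf")  # term does not occur anywhere
        score += math.log(p)
    return score


# Toy patents: id -> (classification code used as cluster label, tokens).
patents = {
    "p1": ("G06F", ["query", "index", "search"]),
    "p2": ("G06F", ["retrieval", "ranking", "search"]),
    "p3": ("A61K", ["compound", "dosage", "tablet"]),
}


def rank(query):
    """Rank patent ids by descending cluster-smoothed query likelihood."""
    collection = [tokens for _, tokens in patents.values()]
    scores = {}
    for pid, (code, tokens) in patents.items():
        # The cluster is every patent sharing this patent's code.
        cluster = [t for c, t in patents.values() if c == code]
        scores[pid] = log_query_likelihood(query, tokens, cluster, collection)
    return sorted(scores, key=scores.get, reverse=True)
```

In this toy setup, a query for "retrieval" ranks p1 above p3 even though neither contains the term, because p1 shares a cluster with p2, which does. That is the kind of additional evidence the abstract attributes to clusters.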


Similar resources

A Patent Retrieval Method Using a Hierarchy of Clusters at TUT

To retrieve relevant documents from an enormous document collection, we usually utilize the similarity or distance measure between a query and the documents, or apply document clustering techniques to the document collection and partition it into relevant document groups. For patent retrieval, however, it is difficult to retrieve documents by using query terms only, because complex terminologie...

1st International Workshop on Advances in Patent Information Retrieval (AsPIRe '10), Allan Hanbury

Patent Retrieval specialists in the 21st century face many challenges. They must search very large numbers of documents in multiple languages, expressing complex technological concepts through sophisticated legal clauses. Despite a great deal of theoretical development in Information Retrieval techniques and machine translation approaches, advanced search tools for patent professionals are stil...

PRIME: A System for Multi-lingual Patent Retrieval

Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system also ...

NTCIR-3 Patent Retrieval Experiments at ULIS

Given the growing number of patents filed in multiple countries, users are interested in retrieving patents across languages. We propose a multi-lingual patent retrieval system, which translates a user query into the target language, searches a multilingual database for patents relevant to the query, and improves the browsing efficiency by way of machine translation and clustering. Our system a...

PatMedia: Augmenting Patent Search with Content-Based Image Retrieval

Recently, the intellectual property and information retrieval communities have shown increasing interest in image retrieval, which could augment the current practices of patent search. In this context, this article presents the PatMedia search engine, which is capable of retrieving patent images in a content-based manner. PatMedia is evaluated both by presenting results considering information retriev...


Journal:
  • Inf. Process. Manage.

Volume 43  Issue 

Pages  -

Publication date: 2007